Handling class imbalance in customer churn prediction
نویسندگان
چکیده
0957-4174/$ see front matter 2008 Elsevier Ltd. A doi:10.1016/j.eswa.2008.05.027 * Corresponding author. Tel.: +32 9 264 89 80; fax: E-mail address: [email protected] (D. Va URL: http://www.crm.UGent.be (D. Van den Poel). Customer churn is often a rare event in service industries, but of great interest and great value. Until recently, however, class imbalance has not received much attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6 (1), 7–19]. In this study, we investigate how we can better handle class imbalance in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we investigated the increase in performance of sampling (both random and advanced under-sampling) and two specific modelling techniques (gradient boosting and weighted random forests) compared to some standard modelling techniques. AUC and lift prove to be good evaluation metrics. AUC does not depend on a threshold, and is therefore a better overall evaluation metric compared to accuracy. Lift is very much related to accuracy, but has the advantage of being well used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press]. Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD98). New York, NY: AAAI Press], we find that there is no need to under-sample so that there are as many churners in your training set as non churners. Results show no increase in predictive performance when using the advanced sampling technique CUBE in this study. This is in line with findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI’2000): Special track on inductive learning, Las Vegas, Nevada], who noted that using sophisticated sampling techniques did not give any clear advantage. Weighted random forests, as a cost-sensitive learner, performs significantly better compared to random forests, and is therefore advised. It should, however always be compared to logistic regression. Boosting is a very robust classifier, but never outperforms any other technique. 2008 Elsevier Ltd. All rights reserved.
منابع مشابه
Customer churn prediction using improved balanced random forests
Churn prediction is becoming a major focus of banks in China who wish to retain customers by satisfying their needs under resource constraints. In churn prediction, an important yet challenging problem is the imbalance in the data distribution. In this paper, we propose a novel learning method, called improved balanced random forests (IBRF), and demonstrate its application to churn prediction. ...
متن کاملEnhancing the Performance of the Classifiers for Customer Churn Analysis in Telecommunication Data using EMOTE
Customer Churn is the term refers to the customers who are in threat to leave the company. Growing number of such customers are becoming critical for the telecommunication sector and the telecom sector are also in a situation to retain them to avoid the revenue loss. Prediction of such behaviour is very essential for the telecom sector and Classifiers proved to be the effective one for the same...
متن کاملHierarchical Alpha-cut Fuzzy C-means, Fuzzy ARTMAP and Cox Regression Model for Customer Churn Prediction
As customers are the main asset of any organization, customer churn management is becoming a major task for organizations to retain their valuable customers. In the previous studies, the applicability and efficiency of hierarchical data mining techniques for churn prediction by combining two or more techniques have been proved to provide better performances than many single techniques over a nu...
متن کاملNeighborhood Cleaning Rules and Particle Swarm Optimization for Predicting Customer Churn Behavior in Telecom Industry
Churn prediction is an important task for Customer Relationship Management (CRM) in telecommunication companies. Accurate churn prediction helps CRM in planning effective strategies to retain their valuable customers. However, churn prediction is a complex and challenging task. In this paper, a hybrid churn prediction model is proposed based on combining two approaches; Neighborhood Cleaning Ru...
متن کاملTime-sensitive Customer Churn Prediction based on PU Learning
With the fast development of Internet companies throughout the world, customer churn has become a serious concern. To better help the companies retain their customers, it is important to build a customer churn prediction model to identify the customers who are most likely to churn ahead of time. In this paper, we propose a Timesensitive Customer Churn Prediction (TCCP) framework based on Positi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Expert Syst. Appl.
دوره 36 شماره
صفحات -
تاریخ انتشار 2009